Reconstructing intelligible audio speech from visual speech features

Authors

  • Thomas Le Cornu
  • Ben P. Milner
Abstract

This work describes an investigation into the feasibility of producing intelligible audio speech from visual speech features alone. The proposed method estimates a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners and compared with that of the video signal alone and of the video combined with the reconstructed audio.
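The reconstruction pipeline described above follows the classical source-filter model: a spectral envelope (here, estimated from visual features) shapes an artificial excitation signal through an all-pole synthesis filter. A minimal sketch of one voiced frame, using hypothetical LPC coefficients in place of the visually estimated envelope:

```python
import numpy as np
from scipy.signal import lfilter

def reconstruct_frame(lpc_coeffs, gain, f0, fs, frame_len):
    """Source-filter reconstruction of one voiced frame: an impulse-train
    excitation at pitch f0 is passed through the all-pole filter defined
    by the LPC spectral envelope (here assumed given, e.g. estimated
    from visual features)."""
    excitation = np.zeros(frame_len)
    period = int(fs / f0)            # samples per pitch period
    excitation[::period] = 1.0       # impulse train models the voiced source
    # All-pole synthesis filter: H(z) = gain / (1 - sum_k a_k z^-k)
    a = np.concatenate(([1.0], -np.asarray(lpc_coeffs)))
    return gain * lfilter([1.0], a, excitation)

# Hypothetical stable 2nd-order envelope for one 20 ms frame at 8 kHz
frame = reconstruct_frame([1.3, -0.6], gain=0.1, f0=100.0,
                          fs=8000, frame_len=160)
print(frame.shape)  # (160,)
```

In practice the envelope order is higher (e.g. 10-20 LPC coefficients) and a voiced/unvoiced decision selects between impulse-train and noise excitation; this sketch shows only the voiced case.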


Similar articles

Lip2AudSpec: Speech reconstruction from silent lip movements video

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip-movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound-generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder to extract bottleneck features from the auditory spectrogra...


Czech Audio-Visual Speech Synthesis with an HMM-trained Speech Database and Enhanced Coarticulation

The task of visual speech synthesis is usually solved by concatenating basic speech units selected from a visual speech database. The acoustic part is carried out separately using a similar method. There are two main problems in this process. The first is the design of the database, that is, the estimation of database parameters for all basic speech units. The second is how to co...


Seeing to hear better: evidence for early audio-visual interactions in speech identification.

Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in no...


Joint Audio-Visual Unit Selection – the JAVUS Speech Synthesizer

The author presents a system for speech synthesis that selects and concatenates speech segments (units) of various sizes from a suitably prepared audio-visual speech database. The audio and video tracks of the selected segments are used together in concatenation to preserve audio-visual correlations. The input text is converted into a target phone chain, and the database is searched for appropr...


Enhancing audio speech using visual speech features

This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation between audio and visual speech features. For speech enhancement, a visually derived Wiener filter is developed. This obtains clean speech statistics from visual features by modelling their joint density and making a maximum a posteriori estimate of clean audio from v...
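The Wiener filter described in this abstract applies a per-frequency gain derived from estimates of the clean-speech and noise power spectra; in the paper the clean-speech statistics come from a MAP estimate given visual features, whereas in this sketch they are simply assumed inputs:

```python
import numpy as np

def wiener_gain(clean_psd_est, noise_psd_est):
    """Per-frequency Wiener gain: W(f) = S_s(f) / (S_s(f) + S_n(f)).
    clean_psd_est would be the visually derived clean-speech PSD
    estimate (assumed given here); a small constant avoids division
    by zero."""
    return clean_psd_est / (clean_psd_est + noise_psd_est + 1e-12)

def enhance(noisy_spectrum, clean_psd_est, noise_psd_est):
    # Apply the gain to the noisy short-time spectrum; the noisy
    # phase is retained, as is standard for Wiener filtering.
    return wiener_gain(clean_psd_est, noise_psd_est) * noisy_spectrum

# At frequencies where estimated speech and noise power are equal,
# the gain is 0.5; where speech dominates, it approaches 1.
g = wiener_gain(np.array([1.0, 9.0]), np.array([1.0, 1.0]))
print(g)  # approximately [0.5, 0.9]
```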



Publication date: 2015